Active Sampling for Data Mining

Authors

  • Emanuele Olivetti
  • Paolo Avesani
Abstract

Data mining is a complex process that aims to derive an accurate predictive model from a collection of data. Traditional approaches assume that the data are given in advance and that their quality, size, and structure are independent parameters. In this paper we argue that an extended vision of data mining should include data acquisition as a step of the overall process. Moreover, the static view should be replaced by an evolving perspective that conceives of data mining as an iterative process in which data acquisition and data analysis repeatedly follow each other. A decision support tool based on data mining will have to be extended accordingly: decision making will be concerned not only with prediction but also with a policy for the next data acquisition step. A successful data acquisition strategy will have to take into account both future model accuracy and the cost associated with acquiring each feature. Finding a trade-off between these two components is an open issue. We propose a framework for addressing this new and challenging problem.
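The alternation of analysis and acquisition described in the abstract can be illustrated with a minimal sketch. It is not the authors' framework: the function `active_sampling`, the `estimate_gain` callback, the linear `trade_off` weighting, and the fixed `budget` are all assumptions introduced here for illustration. The sketch greedily acquires whichever feature has the best estimated accuracy gain net of its cost, and stops when no affordable feature is expected to pay for itself.

```python
def active_sampling(costs, estimate_gain, budget, trade_off=1.0):
    """Greedy cost-aware acquisition loop (illustrative sketch only).

    costs:         dict mapping feature name -> acquisition cost
    estimate_gain: hypothetical callback (acquired, name) -> expected
                   accuracy gain from acquiring `name` next
    budget:        total cost that may be spent (an assumption)
    trade_off:     weight converting cost into accuracy units (an assumption)
    """
    acquired = []
    spent = 0.0
    while True:
        # Analysis step: score every affordable, not-yet-acquired feature
        # as expected gain minus weighted cost.
        scores = {
            name: estimate_gain(acquired, name) - trade_off * cost
            for name, cost in costs.items()
            if name not in acquired and spent + cost <= budget
        }
        # Stop when nothing affordable is expected to pay for itself.
        if not scores or max(scores.values()) <= 0:
            break
        # Acquisition step: take the best-scoring feature.
        best = max(scores, key=scores.get)
        acquired.append(best)
        spent += costs[best]
    return acquired, spent
```

In practice, `estimate_gain` would be backed by the current predictive model and re-evaluated after every acquisition, which is what makes the process iterative rather than a one-off ranking.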

Similar resources

Gaussian Processes for Active Data Mining of Spatial Aggregates

Active data mining is becoming prevalent in applications requiring focused sampling of data relevant to a high-level mining objective. It is especially pertinent in scientific and engineering applications where we seek to characterize a configuration space or design space in terms of spatial aggregates, and where data collection can become costly. Examples abound in domains such as aircraft des...

Using a Data Mining Tool and FP-Growth Algorithm Application for Extraction of the Rules in two Different Dataset (TECHNICAL NOTE)

In this paper, we want to improve association rules in order to be used in recommenders. Recommender systems present a method to create the personalized offers. One of the most important types of recommender systems is the collaborative filtering that deals with data mining in user information and offering them the appropriate item. Among the data mining methods, finding frequent item sets and ...

Active Feature Selection Using Classes

Feature selection is frequently used in data pre-processing for data mining. When the training data set is too large, sampling is commonly used to overcome the difficulty. This work investigates the applicability of active sampling in feature selection in a filter model setting. Our objective is to partition data by taking advantage of class information so as to achieve the same or better perfo...

Mining basic active structures from a large-scale database

BACKGROUND The Pubchem Database is a large-scale resource for chemical information, containing millions of chemical compound activities derived by high-throughput screening (HTS). The ability to extract characteristic substructures from such enormous amounts of data is steadily growing in importance. Compounds with shared basic active structures (BASs) exhibiting G-protein coupled receptor (GPC...

Augmented Query Strategies for Active Learning in Stream Data Mining

Active learning is used in situations where the amount of unlabeled data is abundant but it is costly to manually label the data. So, depending on our available budget, from all unlabeled instances we are to select only a subset of them to ask the oracle for manual labeling. Thus, the query strategy, i.e., how relevant instances are selected to be sent to the oracle, plays an important role in ...

Automatic estimation of regularization parameter by active constraint balancing method for 3D inversion of gravity data

Gravity data inversion is one of the important steps in the interpretation of practical gravity data. The inversion result can be obtained by minimization of the Tikhonov objective function. The determination of an optimal regularization parameter is highly important in the gravity data inversion. In this work, an attempt was made to use the active constraint balancing (ACB) method to select the...

Publication date: 2004